

Machine learning-guided construction of an analytic kinetic energy functional for orbital free density functional theory

Manzhos, Sergei, Lüder, Johann, Ihara, Manabu

arXiv.org Machine Learning

Machine learning (ML) of kinetic energy functionals (KEFs) for orbital-free density functional theory (OF-DFT) holds the promise of addressing an important bottleneck in large-scale ab initio materials modeling, where sufficiently accurate analytic KEFs are lacking. However, ML models are not as easily handled as analytic expressions; they must be provided as algorithms together with associated data. Here, we bridge the two approaches and construct an analytic expression for a KEF guided by interpretative machine learning of crystal cell-averaged kinetic energy densities (τ) of several hundred materials. A previously published dataset comprising multiple phases of 433 unary, binary, and ternary compounds containing Li, Al, Mg, Si, As, Ga, Sb, Na, Sn, P, and In was used for training, including data at the equilibrium geometry as well as strained structures. A hybrid Gaussian process regression-neural network (GPR-NN) method was used to understand how τ depends on features containing cell-averaged terms of the fourth-order gradient expansion and the product of the electron density and the Kohn-Sham effective potential. Based on this analysis, an analytic model is constructed that reproduces Kohn-Sham DFT energy-volume curves with sufficient accuracy (pronounced minima sufficiently close, in position and curvature, to those of the Kohn-Sham DFT-based curves) to enable structure optimizations and elastic response calculations.
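For context, the low-order terms of the kinetic-energy gradient expansion referenced in the abstract have a standard form, shown here to second order in Hartree atomic units (the exact fourth-order feature definitions used in the paper are not reproduced here):

```latex
% Thomas-Fermi term (zeroth order of the gradient expansion)
T_{\mathrm{TF}}[n] = C_F \int n^{5/3}(\mathbf{r})\, d\mathbf{r},
\qquad C_F = \tfrac{3}{10}\,(3\pi^2)^{2/3}

% von Weizsacker term
T_{\mathrm{vW}}[n] = \tfrac{1}{8} \int \frac{|\nabla n(\mathbf{r})|^2}{n(\mathbf{r})}\, d\mathbf{r}

% second-order gradient expansion
T_{\mathrm{GE2}}[n] = T_{\mathrm{TF}}[n] + \tfrac{1}{9}\, T_{\mathrm{vW}}[n]
```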


The Physics-Informed Neural Network Gravity Model: Generation III

Martin, John, Schaub, Hanspeter

arXiv.org Artificial Intelligence

Scientific machine learning and the advent of the Physics-Informed Neural Network (PINN) show considerable potential in their capacity to identify solutions to complex differential equations. Over the past two years, much work has gone into the development of PINNs capable of solving the gravity field modeling problem, i.e., learning a differentiable form of the gravitational potential from position and acceleration estimates. While past PINN gravity models (PINN-GMs) have demonstrated advantages in model compactness, robustness to noise, and sample efficiency, key modeling challenges remain which this paper aims to address. Specifically, this paper introduces the third generation of the Physics-Informed Neural Network Gravity Model (PINN-GM-III), which solves the problems of extrapolation error, bias towards low-altitude samples, numerical instability at high altitudes, and compliant boundary conditions through numerous modifications to the model's design. The PINN-GM-III is tested by modeling a known heterogeneous-density asteroid, and its performance is evaluated using seven core metrics which showcase its strengths against its predecessors and other analytic and numerical gravity models.
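The physical constraint at the heart of such a model is that acceleration is the negative gradient of a scalar potential, a = -∇U. A minimal sketch of that relationship, using a point-mass potential as a stand-in for the learned network (the value of μ and the test point are illustrative; a PINN would differentiate the network via automatic differentiation, not finite differences):

```python
import numpy as np

MU = 1.0  # gravitational parameter, illustrative units

def potential(r):
    # Point-mass potential U(r) = -mu / |r|; in a PINN-GM this scalar
    # function would be a trained neural network instead.
    return -MU / np.linalg.norm(r)

def acceleration_from_potential(r, h=1e-6):
    # a = -grad U, computed here with central finite differences.
    a = np.zeros(3)
    for i in range(3):
        dr = np.zeros(3)
        dr[i] = h
        a[i] = -(potential(r + dr) - potential(r - dr)) / (2 * h)
    return a

r = np.array([1.0, 2.0, 2.0])                # |r| = 3
a_model = acceleration_from_potential(r)
a_exact = -MU * r / np.linalg.norm(r) ** 3   # analytic point-mass acceleration
print(np.allclose(a_model, a_exact, atol=1e-6))  # True
```

Training a PINN-GM amounts to fitting the potential so that this derived acceleration matches measured accelerations at sampled positions.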


Machine Learning Tutorial with Python, Jupyter, KSQL and TensorFlow

#artificialintelligence

When Uber's ML platform Michelangelo started, the most urgent and highest-impact use cases were some very high-scale problems, which led the team to build around Apache Spark (for large-scale data processing and model training) and Java (for low-latency, high-throughput online serving). This structure worked well for production training and deployment of many models but left a lot to be desired in terms of overhead, flexibility, and ease of use, especially during early prototyping and experimentation, where notebooks and Python shine. Uber expanded Michelangelo "to serve any kind of Python model from any source to support other machine learning and deep learning frameworks like PyTorch and TensorFlow," instead of just using Spark for everything. So why did Uber (and many other tech companies) build their own framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system: it allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure, and also to keep your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo.
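The consume-score-produce loop at the heart of such a Kafka-based serving pipeline can be sketched without any infrastructure. Everything below is illustrative: a plain Python list stands in for a Kafka topic, and a threshold rule stands in for a trained TensorFlow/PyTorch model's predict call.

```python
def score(event):
    # Hypothetical rule standing in for a real model's predict();
    # flags any event whose amount exceeds an illustrative threshold.
    return {"id": event["id"], "flag": event["amount"] > 100.0}

input_topic = [               # stand-in for incoming Kafka messages
    {"id": 1, "amount": 42.0},
    {"id": 2, "amount": 250.0},
]
output_topic = []             # stand-in for the topic scored results go to

for event in input_topic:                  # consume
    output_topic.append(score(event))      # score and produce

print(output_topic)
```

In a real deployment, the loop body stays the same shape; only the list is replaced by a Kafka consumer/producer pair and the rule by a loaded model.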


PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences

Boukhers, Zeyd, Bleier, Arnim, Yediel, Yeliz Ucer, Hienstorfer-Heitmann, Mio, Jaberansary, Mehrshad, Koumpis, Adamantios, Beyan, Oya

arXiv.org Artificial Intelligence

Data privacy and ownership are significant concerns in social data science, raising legal and ethical issues. Sharing and analyzing data is difficult when different parties own different parts of it. One approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis; however, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach in which the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, results are not released until the analysis has completed on all data locations, to ensure privacy and avoid bias in the results.
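The incremental-visit idea can be illustrated with a toy model whose state travels between sites. The site names and values are invented for illustration, and the "model" here is just a running sum and count; PADME's actual trains carry full model implementations.

```python
# Each site holds its own data; raw values never leave their location.
sites = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [10.0],
    "site_c": [4.0, 5.0],
}

model = {"total": 0.0, "count": 0}  # model state that travels between sites

for name, local_data in sites.items():
    # Training step executed locally at each site: only the updated
    # aggregate state moves on to the next location.
    model["total"] += sum(local_data)
    model["count"] += len(local_data)

# The result is released only after all locations have been visited.
global_mean = model["total"] / model["count"]
print(global_mean)  # 25/6, i.e. about 4.1667
```

The final mean equals what pooling all the data in one place would give, yet no site ever saw another site's records.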


What's The Difference Between BI Analyst and Data Scientist?

#artificialintelligence

This is still the #1 question I get from many data warehouse and business intelligence folks. I used to show Figure 1 (the BI Analyst vs. Data Scientist Characteristics chart, which shows the different attitudinal approaches of each) and Figure 2 (Business Intelligence vs. Data Science, which shows the different types of questions each tries to address) in response to this question. However, these slides lack the context required to answer the question satisfactorily; I'm never sure the audience really understands the inherent differences between what a BI analyst does and what a data scientist does. The key is to understand the differences between the BI analyst's and the data scientist's goals, tools, techniques, and approaches. Figure 3 outlines the high-level analytic process that a typical BI analyst uses when engaging with business users.


Making AI accountable: Blockchain, governance, and auditability

#artificialintelligence

The past few years have brought much hand-wringing and arm-waving about artificial intelligence (AI), as business people and technologists alike worry about the outsize decisioning power they believe these systems to have. As a data scientist, I am accustomed to being the voice of reason about the possibilities and limitations of AI. In this article I'll explain how companies can use blockchain technology for model development governance: an approach that helps organizations better understand AI, makes the model development process auditable, and identifies and assigns accountability for AI decisioning. While there is widespread awareness of the need to govern AI, the discussion of how to do so is often nebulous, as in "How to Build Accountability into Your AI" in Harvard Business Review: A healthy ecosystem for managing AI must include governance processes and structures.... Accountability for AI means looking for solid evidence of governance at the organizational level, including clear goals and objectives for the AI system; well-defined roles, responsibilities, and lines of authority; a multidisciplinary workforce capable of managing AI systems; a broad set of stakeholders; and risk-management processes. Additionally, it is vital to look for system-level governance elements, such as documented technical specifications of the particular AI system, compliance, and stakeholder access to system design and operation information.


What Movies Can Teach Us About Prospering in an AI World – Part 1 - DataScienceCentral.com

#artificialintelligence

In his book Outliers, Malcolm Gladwell unveils the "10,000-Hour Rule," which postulates that the key to achieving world-class mastery of a skill is a matter of 10,000 hours of practice or learning. And while there may be disagreement on the actual number of hours (though I did hear my basketball coaches yell that at me about 10,000 times), let's say we accept that it takes roughly 10,000 hours of practice and learning (exploring, trying, failing, learning, then exploring, trying, failing, and learning again) for one to master a skill. If that is truly the case, then dang, we humans are doomed. Think about 1,000,000 Tesla cars with their Full Self-Driving (FSD) autonomous driving modules practicing and learning every hour they are driving. In a single hour of the day, Tesla's FSD driving module is learning 100x more than what Malcolm Gladwell postulates is necessary to master a task.
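The 100x figure follows directly from the numbers in the text: a fleet of one million cars logs one million driving hours per clock hour, against the 10,000-hour mastery benchmark.

```python
fleet_size = 1_000_000   # Tesla cars running FSD, figure from the text
mastery_hours = 10_000   # Gladwell's 10,000-Hour Rule

# Each car logs one hour of experience per clock hour of driving.
fleet_hours_per_hour = fleet_size * 1

ratio = fleet_hours_per_hour / mastery_hours
print(ratio)  # 100.0 -> the "100x" claim in the text
```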


Predictive maintenance in industry 4.0: applications and advantages

#artificialintelligence

Machines play a huge role in our everyday lives, but without maintenance every machine eventually breaks down. Companies follow various maintenance programs to increase operational reliability and reduce costs. Maintenance is the set of operations necessary to preserve the functionality and efficiency of an asset; it can take place in response to a failure or as a previously planned action. According to research conducted by Deloitte, a non-optimized maintenance strategy can reduce the production capacity of an industrial plant by 5 to 20%. Recent studies also show that downtime costs industrial manufacturers about 45 billion euros a year.


4 Ways That Your Accurate Model May Not Be Good Enough

#artificialintelligence

When we were in school and were given a problem to solve, we usually stopped working on the problem as soon as we found the answer and recorded it on our paper. This might be a fair approach for elementary school assignments, but it does not serve us well in higher education or in life. Unfortunately, many people carry this learned behavior into adulthood, at university and on the job. Consequently, these people miss new opportunities for learning, discovery, recognition, and advancement. In data science, we are trained to keep searching (at least, I hope that this is true for all of us) even after we find that first model from our data that appears to answer our business question accurately.


Operationalizing AI models for Digital Twin Initiatives

#artificialintelligence

Digital twin technology continues to be adopted by manufacturing industries to support business strategy and gain efficiencies in operations and customer service. Definition: Digital twin refers to a digital replica of potential and actual physical assets (physical twin), processes, people, places, systems and devices that can be used for various purposes. Digital twins are used for a wide variety of use cases. Manufacturers use digital twins to help them reduce maintenance costs on machinery and optimize production output. For example, by analyzing and experimenting with the virtual copy, manufacturers don't have to take down physical operations to test and implement updates.